{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### k近邻算法\n",
    "\n",
    "给定一个训练数据集，对于新的输入实例，根据其k个最近邻的训练实例的类别，通过多数表决等方式进行预测：在训练数据集中找到与该实例最邻近的k个实例，这k个实例的多数属于某个类，就把该输入实例分为这个类。\n",
    "\n",
    "### 模型\n",
    "\n",
    "k近邻法使用的模型实际上对应特征空间的划分，模型由三个基本要素--距离度量，k值的选择和分类决策规则决定\n",
    "\n",
    "![](https://raw.githubusercontent.com/liuzhaoo/markdown_pics/master/img/kjinlin.png)\n",
    "\n",
    "特征空间中，对每个训练实例点$x_i$,所有距离该点比其他点更近的点组成的区域叫做单元（cell）每个训练实例点拥有一个单元，所有训练实例点的单元构成对特征空间的一个划分。\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 距离度量\n",
    "使用欧氏距离\n",
    "\n",
    "设特征空间$x$是$n$维实数向量空间 ，$x_{i}, x_{j} \\in \\mathcal{X}$,$x_{i}=\\left(x_{i}^{(1)}, x_{i}^{(2)}, \\cdots, x_{i}^{(n)}\\right)^{\\mathrm{T}}$,$x_{j}=\\left(x_{j}^{(1)}, x_{j}^{(2)}, \\cdots, x_{j}^{(n)}\\right)^{\\mathrm{T}}$ ，则：$x_i$,$x_j$的$L_p$距离定义为:\n",
    "\n",
    "$$L_{p}\\left(x_{i}, x_{j}\\right)=\\left(\\sum_{l=1}^{n}\\left|x_{i}^{(l)}-x_{j}^{(l)}\\right|^{p}\\right)^{\\frac{1}{p}}$$\n",
    "\n",
    "p = 2时为欧式距离\n",
    "#### k值的选择\n",
    "\n",
    "k值较小，近似误差会减小，估计误差会增大。容易受近邻点噪声的影响，也就是容易发生过拟合。k值较大时，与输入实例较远的训练实例也会对预测起作用，容易发生错误，这时的整体的模型简单。\n",
    "\n",
    "#### 分类决策规则\n",
    "\n",
    "k近邻法中的分类决策规则往往是多数表决，即由输入实例的k个临近的训练实例中的多数类决定输入实例的类"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 距离计算函数\n",
    "\n",
    "import math\n",
    "from itertools import combinations\n",
    "\n",
    "def L(x,y,p=2):\n",
    "    if len(x) == len(y) and len(x) > 1:\n",
    "        sum = 0\n",
    "        for i in range(len(x)):\n",
    "            sum += math.pow(abs(x[i]-y[i]),p)\n",
    "        return math.pow(sum,1/p)\n",
    "    \n",
    "    else:\n",
    "        return 0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**例3.1**  已知二维空间的三个点x1 x2 x3  试求在p取不同值时，$L_p$ 距离下x1 的最近邻点"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "p=1时，x1与[5, 1]间的距离为4.0\n",
      "p=1时，x1与[4, 4]间的距离为6.0\n",
      "p=2时，x1与[5, 1]间的距离为4.0\n",
      "p=2时，x1与[4, 4]间的距离为4.242640687119285\n",
      "p=3时，x1与[5, 1]间的距离为3.9999999999999996\n",
      "p=3时，x1与[4, 4]间的距离为3.7797631496846193\n",
      "p=4时，x1与[5, 1]间的距离为4.0\n",
      "p=4时，x1与[4, 4]间的距离为3.5676213450081633\n"
     ]
    }
   ],
   "source": [
    "x1 = [1,1]\n",
    "x2 = [5, 1]\n",
    "x3 = [4, 4]\n",
    "\n",
    "for i in range(1,5):\n",
    "    for c in [x2,x3]:\n",
    "        r = L(x1,c,i)\n",
    "        print('p={}时，x1与{}间的距离为{}'.format(i,c,r))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
